{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Estimating the Cost of Equity from Historical Price Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want to estimate the cost of equity for a company. We have historical data on its stock prices, as well as prices of a market portfolio. We will estimate the CAPM $\\beta$, and then calculate the CAPM to determine the cost of equity.\n",
    "\n",
    ":As a reminder, the CAPM formula is given by $$r_i = r_f + \\beta (r_m - r_f) + \\epsilon$$\n",
    "\n",
    "## Load in Price Data\n",
    "\n",
    "First let's load in the historical price data. We can use `pandas` to load the Excel file into Python. Ensure that the Excel workbook is in the same folder as your Jupyer notebook.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "df = pd.read_excel('price data.xlsx')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Market Portfolio</th>\n",
       "      <th>Asset Price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1000.000000</td>\n",
       "      <td>100.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1178.861002</td>\n",
       "      <td>88.056952</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1400.554023</td>\n",
       "      <td>71.554169</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1476.669385</td>\n",
       "      <td>57.621757</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1536.613924</td>\n",
       "      <td>47.839685</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Market Portfolio  Asset Price\n",
       "0       1000.000000   100.000000\n",
       "1       1178.861002    88.056952\n",
       "2       1400.554023    71.554169\n",
       "3       1476.669385    57.621757\n",
       "4       1536.613924    47.839685"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()  # print the first 5 rows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculating Returns\n",
    "\n",
    "The CAPM works with returns and not prices, so let's convert our prices to returns. Luckily the pandas method `pct_change` handles this for us."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Market Portfolio</th>\n",
       "      <th>Asset Price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.178861</td>\n",
       "      <td>-0.119430</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.188057</td>\n",
       "      <td>-0.187410</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.054347</td>\n",
       "      <td>-0.194711</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.040594</td>\n",
       "      <td>-0.169764</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Market Portfolio  Asset Price\n",
       "0               NaN          NaN\n",
       "1          0.178861    -0.119430\n",
       "2          0.188057    -0.187410\n",
       "3          0.054347    -0.194711\n",
       "4          0.040594    -0.169764"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "returns = df.pct_change()\n",
    "returns.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first values are missing (`NaN`) because we can't calculate a return off of a single number."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculating the Market Risk Premium\n",
    "\n",
    "We are ultimately going to be running a regression to determine $\\beta$. We can think of a standard regression line as following the equation: $$y = a + bx$$ We can put the CAPM in this format if we assume $\\epsilon$ is zero, then treat $r_i$ as $y$, $r_f$ as $a$, and $(r_m - r_f)$ as $x$. Therefore we need to calculate the market risk premium (MRP), $(r_m - r_f)$, to use in the regression. \n",
    "\n",
    "From the problem, the risk free rate is 3%. So just subtract that from the market returns to get the MRP."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Market Portfolio</th>\n",
       "      <th>Asset Price</th>\n",
       "      <th>MRP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.178861</td>\n",
       "      <td>-0.119430</td>\n",
       "      <td>0.148861</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.188057</td>\n",
       "      <td>-0.187410</td>\n",
       "      <td>0.158057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.054347</td>\n",
       "      <td>-0.194711</td>\n",
       "      <td>0.024347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.040594</td>\n",
       "      <td>-0.169764</td>\n",
       "      <td>0.010594</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Market Portfolio  Asset Price       MRP\n",
       "0               NaN          NaN       NaN\n",
       "1          0.178861    -0.119430  0.148861\n",
       "2          0.188057    -0.187410  0.158057\n",
       "3          0.054347    -0.194711  0.024347\n",
       "4          0.040594    -0.169764  0.010594"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "risk_free = 0.03\n",
    "returns['MRP'] = returns['Market Portfolio'] - risk_free\n",
    "returns.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculating $\\beta$\n",
    "\n",
    "Now we are ready to run the regression of stock returns on the MRP. We can use `statsmodels` to run the OLS regression. We will also add a constant to the X variables, to have an intercept in the regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "ename": "MissingDataError",
     "evalue": "exog contains inf or nans",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mMissingDataError\u001b[0m                          Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-5-1457d72bb165>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mreturns\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'Asset Price'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mmodel\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mOLS\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/regression/linear_model.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, endog, exog, missing, hasconst, **kwargs)\u001b[0m\n\u001b[1;32m    857\u001b[0m                  **kwargs):\n\u001b[1;32m    858\u001b[0m         super(OLS, self).__init__(endog, exog, missing=missing,\n\u001b[0;32m--> 859\u001b[0;31m                                   hasconst=hasconst, **kwargs)\n\u001b[0m\u001b[1;32m    860\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0;34m\"weights\"\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_init_keys\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    861\u001b[0m             \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_init_keys\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mremove\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"weights\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/regression/linear_model.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, endog, exog, weights, missing, hasconst, **kwargs)\u001b[0m\n\u001b[1;32m    700\u001b[0m             \u001b[0mweights\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mweights\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msqueeze\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    701\u001b[0m         super(WLS, self).__init__(endog, exog, missing=missing,\n\u001b[0;32m--> 702\u001b[0;31m                                   weights=weights, hasconst=hasconst, **kwargs)\n\u001b[0m\u001b[1;32m    703\u001b[0m         \u001b[0mnobs\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexog\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    704\u001b[0m         \u001b[0mweights\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mweights\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/regression/linear_model.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, endog, exog, **kwargs)\u001b[0m\n\u001b[1;32m    188\u001b[0m     \"\"\"\n\u001b[1;32m    189\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mendog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 190\u001b[0;31m         \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mRegressionModel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mendog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    191\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_data_attr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mextend\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'pinv_wexog'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'wendog'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'wexog'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'weights'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    192\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/base/model.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, endog, exog, **kwargs)\u001b[0m\n\u001b[1;32m    234\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    235\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mendog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexog\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 236\u001b[0;31m         \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mLikelihoodModel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mendog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    237\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0minitialize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    238\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/base/model.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, endog, exog, **kwargs)\u001b[0m\n\u001b[1;32m     75\u001b[0m         \u001b[0mhasconst\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpop\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'hasconst'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     76\u001b[0m         self.data = self._handle_data(endog, exog, missing, hasconst,\n\u001b[0;32m---> 77\u001b[0;31m                                       **kwargs)\n\u001b[0m\u001b[1;32m     78\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk_constant\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk_constant\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     79\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexog\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexog\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/base/model.py\u001b[0m in \u001b[0;36m_handle_data\u001b[0;34m(self, endog, exog, missing, hasconst, **kwargs)\u001b[0m\n\u001b[1;32m     98\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     99\u001b[0m     \u001b[0;32mdef\u001b[0m \u001b[0m_handle_data\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mendog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmissing\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhasconst\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 100\u001b[0;31m         \u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mhandle_data\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mendog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmissing\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhasconst\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    101\u001b[0m         \u001b[0;31m# kwargs arrays could have changed, easier to just attach here\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    102\u001b[0m         \u001b[0;32mfor\u001b[0m \u001b[0mkey\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/base/data.py\u001b[0m in \u001b[0;36mhandle_data\u001b[0;34m(endog, exog, missing, hasconst, **kwargs)\u001b[0m\n\u001b[1;32m    670\u001b[0m     \u001b[0mklass\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mhandle_data_class_factory\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mendog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mexog\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    671\u001b[0m     return klass(endog, exog=exog, missing=missing, hasconst=hasconst,\n\u001b[0;32m--> 672\u001b[0;31m                  **kwargs)\n\u001b[0m",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/base/data.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, endog, exog, missing, hasconst, **kwargs)\u001b[0m\n\u001b[1;32m     85\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mconst_idx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     86\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mk_constant\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 87\u001b[0;31m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_handle_constant\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mhasconst\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     88\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_check_integrity\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     89\u001b[0m         \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_cache\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m{\u001b[0m\u001b[0;34m}\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m~/.local/share/virtualenvs/fin-model-course-eIFMSc8A/lib/python3.7/site-packages/statsmodels/base/data.py\u001b[0m in \u001b[0;36m_handle_constant\u001b[0;34m(self, hasconst)\u001b[0m\n\u001b[1;32m    131\u001b[0m             \u001b[0mexog_max\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmax\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    132\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0misfinite\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mexog_max\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mall\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 133\u001b[0;31m                 \u001b[0;32mraise\u001b[0m \u001b[0mMissingDataError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'exog contains inf or nans'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    134\u001b[0m             \u001b[0mexog_min\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexog\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0maxis\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    135\u001b[0m             \u001b[0mconst_idx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mwhere\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mexog_max\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0mexog_min\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msqueeze\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mMissingDataError\u001b[0m: exog contains inf or nans"
     ]
    }
   ],
   "source": [
    "import statsmodels.api as sm\n",
    "\n",
    "X = sm.add_constant(returns['MRP'])\n",
    "y = returns['Asset Price']\n",
    "\n",
    "model = sm.OLS(y, X)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But you can see we ran into a problem `MissingDataError`. This is because we had those `NaN`s in the first row. We can remove these easily."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Market Portfolio</th>\n",
       "      <th>Asset Price</th>\n",
       "      <th>MRP</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.178861</td>\n",
       "      <td>-0.119430</td>\n",
       "      <td>0.148861</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.188057</td>\n",
       "      <td>-0.187410</td>\n",
       "      <td>0.158057</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.054347</td>\n",
       "      <td>-0.194711</td>\n",
       "      <td>0.024347</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.040594</td>\n",
       "      <td>-0.169764</td>\n",
       "      <td>0.010594</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>0.076758</td>\n",
       "      <td>0.227258</td>\n",
       "      <td>0.046758</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Market Portfolio  Asset Price       MRP\n",
       "1          0.178861    -0.119430  0.148861\n",
       "2          0.188057    -0.187410  0.158057\n",
       "3          0.054347    -0.194711  0.024347\n",
       "4          0.040594    -0.169764  0.010594\n",
       "5          0.076758     0.227258  0.046758"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "returns = returns.dropna()\n",
    "returns.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simply by using `.dropna()` we can remove those `NaN`s so we can run the regression. Let's try that again."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table class=\"simpletable\">\n",
       "<caption>OLS Regression Results</caption>\n",
       "<tr>\n",
       "  <th>Dep. Variable:</th>       <td>Asset Price</td>   <th>  R-squared:         </th> <td>   0.225</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Model:</th>                   <td>OLS</td>       <th>  Adj. R-squared:    </th> <td>   0.218</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Method:</th>             <td>Least Squares</td>  <th>  F-statistic:       </th> <td>   28.53</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Date:</th>             <td>Mon, 30 Mar 2020</td> <th>  Prob (F-statistic):</th> <td>5.99e-07</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Time:</th>                 <td>17:47:49</td>     <th>  Log-Likelihood:    </th> <td>  13.883</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>No. Observations:</th>      <td>   100</td>      <th>  AIC:               </th> <td>  -23.77</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Df Residuals:</th>          <td>    98</td>      <th>  BIC:               </th> <td>  -18.56</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Df Model:</th>              <td>     1</td>      <th>                     </th>     <td> </td>   \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Covariance Type:</th>      <td>nonrobust</td>    <th>                     </th>     <td> </td>   \n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "    <td></td>       <th>coef</th>     <th>std err</th>      <th>t</th>      <th>P>|t|</th>  <th>[0.025</th>    <th>0.975]</th>  \n",
       "</tr>\n",
       "<tr>\n",
       "  <th>const</th> <td>    0.0306</td> <td>    0.024</td> <td>    1.301</td> <td> 0.196</td> <td>   -0.016</td> <td>    0.077</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>MRP</th>   <td>    0.8338</td> <td>    0.156</td> <td>    5.341</td> <td> 0.000</td> <td>    0.524</td> <td>    1.144</td>\n",
       "</tr>\n",
       "</table>\n",
       "<table class=\"simpletable\">\n",
       "<tr>\n",
       "  <th>Omnibus:</th>       <td> 1.312</td> <th>  Durbin-Watson:     </th> <td>   2.049</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Prob(Omnibus):</th> <td> 0.519</td> <th>  Jarque-Bera (JB):  </th> <td>   1.114</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Skew:</th>          <td> 0.010</td> <th>  Prob(JB):          </th> <td>   0.573</td>\n",
       "</tr>\n",
       "<tr>\n",
       "  <th>Kurtosis:</th>      <td> 2.483</td> <th>  Cond. No.          </th> <td>    7.37</td>\n",
       "</tr>\n",
       "</table><br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
      ],
      "text/plain": [
       "<class 'statsmodels.iolib.summary.Summary'>\n",
       "\"\"\"\n",
       "                            OLS Regression Results                            \n",
       "==============================================================================\n",
       "Dep. Variable:            Asset Price   R-squared:                       0.225\n",
       "Model:                            OLS   Adj. R-squared:                  0.218\n",
       "Method:                 Least Squares   F-statistic:                     28.53\n",
       "Date:                Mon, 30 Mar 2020   Prob (F-statistic):           5.99e-07\n",
       "Time:                        17:47:49   Log-Likelihood:                 13.883\n",
       "No. Observations:                 100   AIC:                            -23.77\n",
       "Df Residuals:                      98   BIC:                            -18.56\n",
       "Df Model:                           1                                         \n",
       "Covariance Type:            nonrobust                                         \n",
       "==============================================================================\n",
       "                 coef    std err          t      P>|t|      [0.025      0.975]\n",
       "------------------------------------------------------------------------------\n",
       "const          0.0306      0.024      1.301      0.196      -0.016       0.077\n",
       "MRP            0.8338      0.156      5.341      0.000       0.524       1.144\n",
       "==============================================================================\n",
       "Omnibus:                        1.312   Durbin-Watson:                   2.049\n",
       "Prob(Omnibus):                  0.519   Jarque-Bera (JB):                1.114\n",
       "Skew:                           0.010   Prob(JB):                        0.573\n",
       "Kurtosis:                       2.483   Cond. No.                         7.37\n",
       "==============================================================================\n",
       "\n",
       "Warnings:\n",
       "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
       "\"\"\""
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X = sm.add_constant(returns['MRP'])\n",
    "y = returns['Asset Price']\n",
    "\n",
    "model = sm.OLS(y, X)\n",
    "results = model.fit()\n",
    "results.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see there is a 0.8338 coefficient on the MRP. This means our $\\beta$ is 0.8338. We can extract that exact number as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.8337833218355808"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "beta = results.params['MRP']\n",
    "beta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimating the Market Return\n",
    "\n",
    "Now we are only missing one component to plug into CAPM to get the cost of equity: the market return. A good way to estimate this is by taking an average of the historical returns. This can also be adjusted for expectations about the economy in the future (recession, etc.)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.0944856620313094"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "market_return = returns['Market Portfolio'].mean()\n",
    "market_return"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimating the Cost of Equity\n",
    "\n",
    "Now we can plug everything into the CAPM formula to get the $r_i$ cost of equity. CAPM again: $$r_i = r_f + \\beta (r_m - r_f) + \\epsilon$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The cost of equity is 8.38%.\n"
     ]
    }
   ],
   "source": [
    "cost_of_equity = risk_free + beta * (market_return - risk_free)\n",
    "print(f'The cost of equity is {cost_of_equity:.2%}.')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The Exercise in Excel\n",
    "\n",
    "All the steps of the exercise are the same in Excel. The only difference is the functions/process to run each step. For calculating returns, a simple formula of $(new - old)/old$ can be calculated for one cell and dragged to get all the returns. 3% can be subtracted from the market returns and dragged down to yield the MRPs. The regression can be run by enabling the Data Analysis Toolpak add-in and following the prompts. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}